Abstract



A fundamental problem in distributed computing is the task of cooperatively executing a given set 
of t tasks by p processors where the communication medium is dynamic and subject to failures. The 
dynamics of the communication medium lead to groups of processors being disconnected and possibly 
reconnected during the entire course of the computation. The primary objective in this scenario is for 
the group of p processors to compute all the t tasks while minimizing the total work done [3]. In the 
partitionable network paradigm, work is defined as the total number of tasks performed (counting
multiplicities) by all the processors during the course of the computation. In [5], the authors consider such a
partitionable network scenario and analyze a simple randomized scheduling algorithm for the case where 
the tasks to be completed are independent of each other. In this paper, we study a natural generalization 
of this problem where the tasks have dependencies among them defined by a task dependency graph. In 
particular, we consider task dependency graphs that are k-partite task graphs. Such task dependency
graphs have been studied extensively in performing dependency analysis of PRAM algorithms.
Specifically, we present a simple randomized algorithm for p processors cooperating to perform t known tasks
where the dependencies between them are defined by a k-partite task dependency graph and, additionally,
these processors are subject to a dynamic communication medium. By virtue of the problem setting,
we pursue competitive analysis where the performance of our algorithm is measured against that of 
the omniscient offline algorithm which has complete knowledge of the dynamics of the communication 
medium. We present a randomized algorithm whose competitive ratio depends on the dynamics of
the communication medium, namely the computation width defined in [5], and also on the nature of the
dependencies among the t tasks characterized by the task graph.

Key words: On-line algorithms, distributed computing, randomized algorithms, competitive analysis, 
partitionable networks 

1 Introduction 

A fundamental problem in distributed computing is the problem of cooperatively executing a given set of tasks 
in a dynamic setting. This problem has been studied in different settings, for instance, in message-passing 
models [1] [2] [3] and in partitionable network models [3] [7]. The challenge is to minimize the total work done
and to maintain efficiency in the face of dynamically changing processor connectivity. In the partitionable 
network paradigm, work is defined as the total number of tasks performed (counting multiplicities) by all 
the processors during the course of the computation [3] . 

In this scenario, we are given a set of t tasks that must be completed in a distributed setting by a set of p 
processors where the communication medium is subject to failures. We assume that the t tasks are similar, 
in that they require the same number of computation steps to finish execution. We further assume that the 
tasks are idempotent - executing a task multiple times has the same effect as a single execution of the task.
The tasks have a dependency relationship defined among them captured by a task dependency graph. If 
task B depends on task A, B cannot be performed before A. 

The dynamics of the communication medium determine a processor's ability to communicate with other 
processors. Effectively, this partitions the processors into groups. Processors that can communicate with 
each other are said to belong to the same group. No communication is possible between processors in different 
groups. Each processor of a group is aware of all the tasks completed by the members of the group. The 
dynamic changes in the communication medium lead to a reconfiguration, i.e., a new partition of processors
into groups. The processors of each new group share knowledge of all the tasks that have been completed among
them so far and then continue executing the remaining tasks from their pool of incomplete tasks
until the next reconfiguration.

This processor group reconfiguration and task execution may be treated as if they were determined by 
an adversary. Thus, the adversary in our model performs two basic operations: reconfigures the processors 
into groups and also allocates the work quota for each group of processors before the next reconfiguration. 
The work quota is the number of tasks that can be completed by the group before the next reconfiguration 
takes place. While the adversary controls the number of tasks that a group can perform, he does not dictate 
which tasks (the identity of the tasks) the group can perform. 

In this setting, the tasks have dependencies defined among them captured by a directed acyclic task graph
(t-DAG) which is a k-partite task graph. Given a group of processors and the tasks known to be completed
by them, an algorithm in this setting decides on the next incomplete task to be completed by this group.
Each processor continues to execute tasks from the given set of t tasks until it is aware that all tasks have
been completed or it runs out of its allocated work quota. Hence, given p processors and t tasks, any algorithm
must execute $\Omega(t \cdot p)$ tasks in the scenario where all the processors are disconnected for the entire
computation, while any reasonable algorithm would incur only $O(t)$ work in the completely connected case.
For this reason, we treat the problem in an on-line setting and pursue competitive analysis, where the performance
of our algorithm is compared against that of the omniscient offline algorithm which has complete knowledge
of all the future changes to the communication medium. Our setting is a generalization of the problem in
[5] since the tasks are no longer independent but have dependencies defined among them. We show that
for this setting more pessimistic bounds hold.

2 Prior Work 

Dwork, Halpern, and Waarts [4] introduced the problem of distributed cooperation for message-passing
models and also defined task-oriented work. Malewicz, Russell and Shvartsman [7] introduced the notion of
h-waste, which measures the worst-case redundant work performed by h groups (or processors) when started
in isolation and merged into a single group at some later time. However, all the known solutions were 
limited in the reconfiguration pattern of the dynamic communication medium and only addressed narrow 
special cases. Georgiou, Russell, and Shvartsman [5] performed competitive analysis and showed a simple 
randomized scheduling algorithm RS (Random Select) whose competitive ratio is tight. Their work also 
introduced a notion of computation width, which associates a natural number with a history of changes 
in the communication medium, and shows both upper and lower bounds on competitiveness in terms of 
this quantity. Specifically, they showed that their simple randomized scheduling algorithm obtains the 
competitive ratio (1 + cw/e), where cw is the computation width of the computation pattern determined 
by the dynamics of the communication medium. 

3 Our Results 

We build on the work of [5]. We study a natural generalization of the problem where the tasks
to be completed are not independent of each other but have a k-partite dependency relationship defined
among them. Each partition of the vertices (tasks) of the k-partite task graph is said to belong to a level.
Independent tasks belong to the first level, tasks dependent on the first-level tasks are at the second level,
and so on. The k-partite task graphs that we consider in our problem are a special kind of task graph
where every task at level $l_{i+1}$ is dependent on every task at level $l_i$, $i = 1, \ldots, k-1$ (i.e., there is a complete set of
directed edges from level $l_i$ to level $l_{i+1}$, $i = 1, \ldots, k-1$). We present a simple randomized algorithm for
p processors cooperating to perform t known tasks where the dependencies between them are defined by a
k-partite task dependency graph with processors subject to a dynamic communication medium. We pursue
competitive analysis and show that pessimistic bounds hold in this case.

Our algorithm Modified-RS extends the algorithm Random Select (RS) presented in [5]. Modified-RS is a
simple randomized scheduling algorithm whose competitive ratio depends on the computation width [5] and
the nature of dependencies among the tasks captured by the task graph. Since one can treat the dynamics
of the communication medium (the computation pattern) as being adversarially determined, we begin by
presenting an instance of a computation pattern which lower bounds the expected number of computation
steps (work) done by any algorithm to perform t tasks on a 2-level bipartite task graph with every task
at level 2 dependent on every task at level 1. We then show in Section 5.3 that algorithm Modified-RS
is $\left(1 + cw\left((1-\alpha) + \frac{\alpha}{e^{c+1}}\right)\right)$-competitive for any computational (p, t)-DAG and for a 2-level task t-DAG,
where cw is the computation width of the computational (p, t)-DAG, $\alpha \in (0, 1]$ denotes the
fraction of tasks in the first level $l_1$ and $c = e(1-\alpha)/\alpha$. This competitive ratio matches the lower bound we
show in Section 5.1 and therefore is tight. We then extend our analysis to any k-level task t-DAG. In Section

5.5 we show that Modified-RS is $\left(1 + cw\left((1-\alpha_1) + \frac{\alpha_1}{e^{a_k}}\right)\right)$-competitive for any computational (p, t)-DAG
and for any k-level task t-DAG, where $a_i$, $i = 1..k$, is a sequence
defined as follows: $a_1 = 1$, $a_{i+1} = c_i e^{a_i} + a_i$ with $c_i = \alpha_{i+1}/\alpha_1$. Here, $\alpha_i \in (0, 1]$ is the fraction of tasks at level $l_i$, $i = 1, \ldots, k$,
cw stands for the computation width of the computational (p, t)-DAG and $c_i > 0$. We also show that this
result is tight as it matches the lower bound we show in Section 5.4.

When all the given tasks are independent, i.e., the task t-DAG has only one level ($\alpha = 1$), the competitive
ratio collapses to (1 + cw/e), the bound offered by [5]. Hence, our results subsume the results of [5].
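
As a quick numeric sanity check of the 2-level bound, the following sketch evaluates $1 + cw((1-\alpha) + \alpha/e^{c+1})$ and confirms that it collapses to $1 + cw/e$ at $\alpha = 1$. The function name `two_level_ratio` is hypothetical, and the closed form $c = e(1-\alpha)/\alpha$ used here is an assumption inferred from the lower-bound derivation in Section 5.1, not a definition quoted from the text.

```python
import math


def two_level_ratio(cw: int, alpha: float) -> float:
    """Evaluate 1 + cw * ((1 - alpha) + alpha / e^(c + 1)).

    Assumption (illustrative): c = e * (1 - alpha) / alpha.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    c = math.e * (1.0 - alpha) / alpha  # assumed closed form for c
    return 1.0 + cw * ((1.0 - alpha) + alpha / math.exp(c + 1.0))


if __name__ == "__main__":
    for alpha in (1.0, 0.75, 0.5, 0.25):
        print(alpha, round(two_level_ratio(4, alpha), 4))
```

Under this assumption, the value at $\alpha = 1$ equals $1 + cw/e$, and it is larger at $\alpha = 0.5$ for the same computation width.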



4 Model and Definitions 

The problem is defined in terms of p asynchronous processors and t tasks with unique identifiers, initially 
known to all processors. For our purposes the tasks are idempotent and similar, i.e., each task requires the 
same number of computation steps. 

Definition 1. A t-DAG is a directed acyclic k-partite graph $G = (V, E)$, where $V = \dot{\bigcup}_{l=1}^{k} V_l = [t] = \{1, \ldots, t\}$.
Edge $e = (t_i, t_j) \in E$, with $t_i \in V_l$ and $t_j \in V_{l+1}$ for some $l = 1, \ldots, k-1$ and $i \neq j$, if and only if task $t_j$ depends on task $t_i$. We write $t_i < t_j$
if task $t_j$ depends on task $t_i$. Here, $\dot{\bigcup}$ stands for disjoint union.

We only consider task graphs where a task on level $l_{i+1}$ depends on all tasks of level $l_i$. The computation
pattern, i.e., the computational (p,t)-DAG defined below, captures the behavior of the adversary that
determines both the partitioning and the number of tasks allocated to each group of the partition.
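
To make the restricted class of task graphs concrete, the sketch below builds a k-level t-DAG from a list of level sizes, adding a complete set of directed edges between consecutive levels. The function name `build_level_dag` and the choice of numbering tasks 1..t level by level are illustrative assumptions.

```python
from itertools import product


def build_level_dag(level_sizes):
    """Return (levels, edges) for a k-level task graph.

    levels[i] lists the task ids at level i + 1; edges holds a complete set
    of directed edges (u, v) from each level to the next one.
    """
    levels = []
    next_id = 1
    for size in level_sizes:
        levels.append(list(range(next_id, next_id + size)))
        next_id += size
    edges = [
        (u, v)
        for lower, upper in zip(levels, levels[1:])
        for u, v in product(lower, upper)
    ]
    return levels, edges


if __name__ == "__main__":
    levels, edges = build_level_dag([2, 3])
    print(levels)      # two tasks on level 1, three tasks on level 2
    print(len(edges))  # every level-2 task depends on every level-1 task
```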

Definition 2. A computational (p,t)-DAG is a directed acyclic graph $C = (V, E)$ augmented with a weight
function $h : V \to [t] \cup \{0\}$ and a labeling $g : V \to 2^{[p]} \setminus \{\emptyset\}$ so that:

• For any maximal path $P = (v_1, v_2, \ldots, v_k)$ in C, $\sum_{i=1}^{k} h(v_i) \ge t$. (This guarantees that any algorithm
terminates during the computation described by the DAG.)

• g possesses the following "initial conditions":

$$[p] = \dot{\bigcup}_{v :\, in(v) = 0}\; g(v).$$

• g respects the following "conservation law":

There is a function $\phi : E \to 2^{[p]} \setminus \{\emptyset\}$ so that for each $v \in V$ with $in(v) > 0$,

$$g(v) = \dot{\bigcup}_{(u,v) \in E}\; \phi((u,v)),$$

and for each $v \in V$ with $out(v) > 0$,

$$g(v) = \dot{\bigcup}_{(v,u) \in E}\; \phi((v,u)).$$

In the above definition, in(v) and out(v) denote the in-degree and out-degree of v respectively. Finally,
for two vertices $u, v \in V$, we write $u \le v$ if there is a directed path from u to v; we then write $u < v$ if
$u \le v$ and u and v are distinct.




Figure 1: Right: An example of a computational (15,t)-DAG, Left: task t-DAG, $\alpha_{1,2,\ldots,k} \in (0, 1]$.
Example. As an example, consider the computational (15,t)-DAG shown in Figure 1. Here we have

$g_1 = \{p_1, p_2, p_3\}$, $g_2 = \{p_4, p_5\}$, $g_3 = \{p_6, p_7\}$, $g_4 = \{p_8, p_9\}$, $g_5 = \{p_{10}, p_{11}\}$, $g_6 = \{p_{12}, p_{13}, p_{14}, p_{15}\}$,
$g_7 = \{p_1, p_2, p_3, p_4, p_5\}$, $g_8 = \{p_7, p_{10}, p_{11}\}$, $g_9 = \{p_1, p_2, p_3, p_4, p_5, p_6\}$, $g_{10} = \{p_{10}, p_{11}\}$, $g_{11} = \{p_7\}$,
$g_{12} = \{p_1, p_2, p_3, p_4, p_5, p_6, p_{11}\}$, $g_{13} = \{p_{10}\}$ and $g_{14} = \{p_7, p_{12}, p_{13}, p_{14}, p_{15}\}$.

Brief description of the example: This computation pattern models a dynamic communication medium
with the following characteristics.
• The first reconfiguration occurs when groups $g_1$ and $g_2$ merge to form group $g_7$. The members of group
$g_7$ are the union of the processors in groups $g_1$ and $g_2$. Prior to this reconfiguration groups $g_1$ and $g_2$
performed exactly 6 and 4 units of work respectively.

• Group $g_4 = \{p_8, p_9\}$ runs in isolation for the entire computation and hence does t units of work.

• Group $g_3 = \{p_6, p_7\}$ performs 7 units of work before the reconfiguration in which it splits between groups $g_8$ and $g_9$.
Group $g_9$ is the result of a merge of group $g_7$ and processor $p_6$ of group $g_3$. Similarly, group $g_8$ is the
result of a merge of group $g_5$ and processor $p_7$ of group $g_3$.

• Group $g_8$ performs 5 units of work before splitting into two groups, $g_{10}$ and $g_{11}$, which proceed to
perform 2 and 9 units of work respectively before the next reconfiguration (assuming that there are at
least 2 or 9 tasks remaining respectively; otherwise they would have performed the remaining tasks).

• Finally, group $g_{12}$ is the result of a merge of group $g_9$ and processor $p_{11}$ of group $g_{10}$. Similarly, group
$g_{13}$ contains the processor $p_{10}$ of group $g_{10}$. Group $g_{14}$ is the result of a merge of groups $g_{11}$ and $g_6$.

• The processors in $g_{12}$, $g_{13}$ and $g_{14}$ run until completion with no further reconfigurations.

Definition 3. Given a computational DAG $C = (V, E)$ and a vertex $v \in V$, we define the predecessor
graph at v, denoted $P_C(v)$, to be the subgraph of C that is formed by the union of all paths in C terminating
at v. Likewise, the successor graph at v, denoted $S_C(v)$, is the subgraph of C that is formed by the union
of all the paths in C originating at v.

Associated with any directed acyclic graph (DAG) $C = (V, E)$ is the natural vertex poset $(V, \le)$ where
$u \le v$ if and only if there is a directed path from u to v. Then the width of C, denoted w(C), is the width
of the poset $(V, \le)$, that is, the size of a largest antichain.

Definition 4. The computation width of a computational DAG $C = (V, E)$, denoted cw(C), is defined as
$cw(C) = \max_{v \in V} w(S_C(v))$.
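
The two notions of width can be computed mechanically on small examples. The sketch below is an illustration under stated assumptions rather than part of the paper's method: it represents the DAG as a hypothetical adjacency dictionary `adj`, computes the poset width via Dilworth's theorem (width equals the number of vertices minus a maximum matching in the strict reachability relation), and takes the maximum over all successor graphs to obtain the computation width.

```python
def reachable(adj, src):
    """Strict descendants of src in the DAG given by adjacency dict adj."""
    seen, stack = set(), [src]
    while stack:
        u = stack.pop()
        for v in adj.get(u, ()):
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen


def poset_width(vertices, adj):
    """Size of a largest antichain among `vertices` (Dilworth's theorem)."""
    vertices = list(vertices)
    vset = set(vertices)
    below = {u: reachable(adj, u) & vset for u in vertices}
    match = {}  # right-side vertex -> left-side vertex matched to it

    def augment(u, visited):
        # Kuhn's augmenting-path step for bipartite matching.
        for v in below[u]:
            if v in visited:
                continue
            visited.add(v)
            if v not in match or augment(match[v], visited):
                match[v] = u
                return True
        return False

    matched = sum(augment(u, set()) for u in vertices)
    return len(vertices) - matched


def computation_width(adj):
    """max over v of the width of the successor graph at v."""
    vertices = set(adj) | {v for vs in adj.values() for v in vs}
    return max(poset_width(reachable(adj, v) | {v}, adj) for v in vertices)


if __name__ == "__main__":
    merge = {"s1": ["x"], "s2": ["x"]}  # two isolated groups that merge
    print(poset_width(["s1", "s2", "x"], merge), computation_width(merge))
```

The merge example shows why the two quantities differ: two groups that start in isolation form an antichain of size 2, yet every successor graph is a chain, so the computation width is 1.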

Let OPT denote the optimal (off-line) algorithm. $W_{OPT}(C)$ and $W_R(C)$ denote the work done on C by the optimal
algorithm and by a randomized algorithm R, respectively.

We treat randomized algorithms as distributions over deterministic algorithms; for a set $\Omega$ and a family of
deterministic algorithms $\{D_r \mid r \in \Omega\}$ we let $R = \mathcal{R}(\{D_r \mid r \in \Omega\})$ denote the randomized algorithm where
r is selected uniformly at random from $\Omega$ and scheduling is done according to $D_r$. For a real-valued random
variable X, we let $E[X]$ denote its expected value.
Specifically, for each C we define $W_{OPT}(C) = \min_D W_D(C)$.

Definition 5. [9] [10] Let $\gamma$ be a real-valued function defined on the set of all (p,t)-DAGs (for all p and t).
A randomized algorithm R is $\gamma$-competitive if for all computation patterns C,

$$E[W_{D_r}(C)] \le \gamma(C)\, W_{OPT}(C),$$

this expectation being taken over uniform choice of $r \in \Omega$.

5 Lower bounds and Algorithm Modified-RS

In this section we give a lower bound on our problem for 2-level task graphs and we present the algorithm
Modified-RS. We then show that for 2-level task graphs the competitive ratio of Modified-RS is tight.

Figure 2: Left: computational (p, t)-DAG, Right: task t-DAG, $\alpha \in (0,1]$.



5.1 A lower bound for 2-level task graphs 



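Theorem 1. Let A be a scheduling algorithm for 2-level task graphs and let $\alpha$ be the fraction of tasks at level $l_1$.
Then,

$$W_A \ge \left(1 + cw\left((1-\alpha) + \frac{\alpha}{e^{c+1}}\right)\right) W_{OPT}, \qquad \text{where } c = \frac{e(1-\alpha)}{\alpha}.$$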



Proof. Consider the computation pattern given by the computational (p, t)-DAG and the task t-DAG in
Figure 2. Initially, the computation pattern has w groups each consisting of a single processor. Let t >> w
and t mod w = 0. Each processor completes $\frac{\alpha t}{w}$ tasks before they are merged and allowed to exchange
information about completed tasks before being split again into w processors, where each processor is allowed
to complete $\frac{(1-\alpha)t}{w}$ tasks; at this point they are merged again and then split into w processors. For this
computation pattern the optimal off-line algorithm completes all the t tasks at the formation of the group
g(U) and accrues exactly t work. Let $P_i \subset G_1$ denote the set of $\frac{\alpha t}{w}$ tasks for processor i. We analyze A when
the tuple $P = (P_1, \ldots, P_w)$ is selected uniformly at random among all such tuples. We will show that for
any algorithm A there is a configuration of the $P_i$ such that

$$W_A \ge \left(1 + (1 - o(1))\, cw\left((1-\alpha) + \frac{\alpha}{e^{c+1}}\right)\right) t.$$



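For a fixed $\tau \in G_1$, let $p_\tau = \Pr[\tau \notin P_i]$; then $\Pr[\tau \notin \bigcup_i P_i] = p_\tau^w$. Let $L_S$ be the random variable whose
value is the number of tasks of $G_1$ left undone at the formation of group g(S).

$$E[|L_S|] = \sum_{\tau \in G_1} p_\tau^w$$

The function $x \mapsto x^w$ is convex on the interval $[0, \infty)$, so $\sum_\tau p_\tau^w$ is minimized when the $p_\tau$ are equal.
Now,

$$\frac{\alpha t}{w} = E[|P_1|] = \sum_{\tau \in G_1} \Pr[\tau \in P_1] = \sum_{\tau \in G_1} (1 - p_\tau)$$

So $\sum_{\tau \in G_1} p_\tau = \alpha t - \frac{\alpha t}{w} = \alpha t \left(1 - \frac{1}{w}\right)$ and hence,

$$E[|L_S|] \ge \alpha t \left(1 - \frac{1}{w}\right)^w$$

Let T be the actual number of tasks left undone at the formation of g(S). In Lemma 2 we show that with
high probability $T \ge \alpha t \left(1 - \frac{1}{w}\right)^w (1 - o(1))$.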
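
The first phase of this argument can be checked numerically. The sketch below is an illustrative simulation, not part of the proof: it assumes each of the w isolated processors picks a uniformly random subset of size n/w from the n first-level tasks (with n standing for $\alpha t$), compares the exact expectation $n(1 - 1/w)^w$ with a Monte Carlo estimate, and shows the value approaching n/e from below. Function names and parameter defaults are hypothetical.

```python
import math
import random


def expected_left_undone(n_tasks: int, w: int) -> float:
    """Exact expectation n * (1 - 1/w)^w under uniform random selection."""
    return n_tasks * (1.0 - 1.0 / w) ** w


def simulate_left_undone(n_tasks: int, w: int, trials: int = 200, seed: int = 0) -> float:
    """Monte Carlo estimate of the number of tasks no processor picked."""
    rng = random.Random(seed)
    quota = n_tasks // w  # tasks each isolated processor may perform
    total = 0
    for _ in range(trials):
        done = set()
        for _ in range(w):
            done.update(rng.sample(range(n_tasks), quota))
        total += n_tasks - len(done)
    return total / trials


if __name__ == "__main__":
    n, w = 1000, 10
    print(expected_left_undone(n, w), simulate_left_undone(n, w), n / math.e)
```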

Clearly, at the formation of group g(U), OPT, the optimal off-line algorithm, would have finished executing
all the t tasks. Let $\alpha_0$ be such that $\frac{(1-\alpha_0)t}{w} = T$; by choosing $\alpha > \alpha_0$ we have that after the first merge and
reconfiguration the number of tasks not completed from $G_1$ is greater than $\frac{(1-\alpha)t}{w}$, and thus the sets of tasks
that algorithm A chooses for the w processors after g(S) have to be picked from $G_1$.

For a specific task $\tau$ in $G_1$ not completed at g(S) (i.e. $\tau \in [T]$), let $p_\tau = \Pr[\tau \notin P_i]$; then $\Pr[\tau \notin \bigcup_i P_i] = p_\tau^w$.
Let $L_U$ be the random variable whose value is the number of tasks of $G_1$ left undone at the formation of
group g(U).

$$E[|L_U|] = E\left[\left|[T] - \bigcup_i P_i\right|\right] = \sum_{\tau \in [T]} p_\tau^w$$

As before, the function $x \mapsto x^w$ is convex on the interval $[0, \infty)$, so $\sum_\tau p_\tau^w$ is minimized when the $p_\tau$ are
equal. Now,

$$\frac{(1-\alpha)t}{w} = E[|P_1|] = \sum_{\tau \in [T]} \Pr[\tau \in P_1] = \sum_{\tau \in [T]} (1 - p_\tau)$$

So, $\sum_{\tau \in [T]} p_\tau = T - \frac{(1-\alpha)t}{w} = T\left(1 - \frac{(1-\alpha)t}{wT}\right)$ and hence,

$$E[|L_U|] \ge T\left(1 - \frac{(1-\alpha)t}{wT}\right)^w \ge \alpha t\left(1 - \frac{1}{w}\right)^w (1 - o(1)) \left(1 - \frac{(1-\alpha)t}{w\, \alpha t \left(1 - \frac{1}{w}\right)^w (1 - o(1))}\right)^w$$

As $\lim_{w \to \infty} \left(1 - \frac{1}{w}\right)^w = \frac{1}{e}$ and choosing $\alpha > \alpha_0$, we get, with $c = e(1-\alpha)/\alpha$,

$$E[|L_U|] \ge (1 - o(1)) \frac{\alpha t}{e^{c+1}} \qquad (1)$$

In particular there must exist a selection of the $P_i$ which achieves this bound. Note that after g(U) the
processors are split again into w processors where they will complete the remaining $(1 - o(1)) \frac{\alpha t}{e^{c+1}}$
tasks of $G_1$ and the $(1 - \alpha)t$ tasks of $G_2$.
So finally the total work of A is at least:

$$\left(1 + (1 - o(1))\, cw\left((1 - \alpha) + \frac{\alpha}{e^{c+1}}\right)\right) t$$

□

Note that when the tasks are independent ($\alpha = 1$) the lower bound is $1 + (1 - o(1))\frac{cw}{e}$, which matches
the result of [5], but the lower bound gets more pessimistic as the fraction of independent tasks gets smaller.

Lemma 2. Pr[\T - E[L<?]| > 4log(t)Vat\ < 
Proof. Let S be the set of tasks randomly picked by the processors in figure [5] at the formation of the
group g(S). For 1 < i < at and 1 < J < ^ we define the random variable YJ that takes the value 
of the j th task picked by the processor . Note that in figure [5] each processor picks 2£ tasks. Let 
the function / be the cardinality of the set S and let Xq, . . . ,X at be the sequence of random variables 
such that X, = E[f(S)\Y?,...,Y!] with I = [4r~|, X = E [/(£)] and X at = f(S). This sequence of 
random variables is a martingale and we can use Azuma's inequality to derive the error probability bound. 

$\Pr[|X_{\alpha t} - X_0| > \lambda\sqrt{\alpha t}] \le 2e^{-\lambda^2/2}$. $L_S$ is the random variable whose value is the number of tasks of $G_1$ left
undone at the formation of group g(S), so $E[L_S] = \alpha t - X_0$ and $T = \alpha t - X_{\alpha t}$. Thus for $\lambda = 4\log(t)$

Pr[\T-E[L s }\ >Alog(t)Val] < 

□ 
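
To get a feel for how small the failure probability in this concentration argument is, the sketch below evaluates the standard Azuma tail $2e^{-\lambda^2/2}$ for a martingale with n steps whose increments are bounded by 1, at $\lambda = 4\log t$. The bounded-increment assumption and the function names are illustrative; they are not taken from the text.

```python
import math


def azuma_tail_bound(lam: float) -> float:
    """Azuma bound 2 * exp(-lam^2 / 2) on Pr[|X_n - X_0| > lam * sqrt(n)].

    Assumes a martingale whose consecutive differences are at most 1.
    """
    return 2.0 * math.exp(-lam * lam / 2.0)


def deviation_threshold(alpha: float, t: int) -> float:
    """The deviation 4 * log(t) * sqrt(alpha * t) used in the lemma."""
    return 4.0 * math.log(t) * math.sqrt(alpha * t)


if __name__ == "__main__":
    t, alpha = 1000, 0.5
    print(deviation_threshold(alpha, t))
    print(azuma_tail_bound(4.0 * math.log(t)))
```

For t = 1000 the bound is already far below any inverse polynomial in t of practical interest, which is why the deviation term can be absorbed into the $(1 - o(1))$ factor.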



5.2 Description of Modified- RS 

We pre-process our task graph to label its nodes with the labeling procedure below. This labeling procedure
assigns every task at level $l_{i+1}$ the label i. In particular, tasks with no dependencies, at level $l_1$, get label 0. A secondary
result of this labeling procedure is that it transforms any arbitrary task graph into a k-level task graph
suitable for our analysis. Modified-RS and its analysis are described formally in the following section.

Given a task t-DAG $G = (V, E)$ we describe the labeling procedure $l : V \to \mathbb{N}$ that assigns a label to
every vertex in the following manner:

• $\forall v \in V$ s.t. $in(v) = 0$, $l(v) = 0$, i.e., the independent tasks have label 0.

• Starting with u s.t. $l(u) = 0$, recursively label the remaining tasks in G in the following manner: if
$(u, v) \in E$, $l(v) = l(u) + 1$. If v already has a label i, overwrite i with the new label j only if $j > i$.
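
The labeling procedure amounts to assigning each task the length of the longest dependency chain leading to it. A minimal sketch, assuming the task graph is given as a vertex list and an edge list (the function name `label_tasks` is hypothetical), processes vertices in topological order so that each label is final when it is propagated:

```python
from collections import defaultdict, deque


def label_tasks(vertices, edges):
    """Label each task with the length of the longest path reaching it."""
    succ = defaultdict(list)
    indeg = {v: 0 for v in vertices}
    for u, v in edges:
        succ[u].append(v)
        indeg[v] += 1

    # Independent tasks (in-degree 0) receive label 0.
    label = {v: 0 for v in vertices if indeg[v] == 0}
    queue = deque(label)
    while queue:
        u = queue.popleft()
        for v in succ[u]:
            # Overwrite an existing label only with a larger one.
            label[v] = max(label.get(v, 0), label[u] + 1)
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    return label


if __name__ == "__main__":
    print(label_tasks([1, 2, 3, 4], [(1, 2), (2, 3), (1, 3), (1, 4)]))
```

In the usage example, task 3 is reachable from task 1 both directly and through task 2, and it keeps the larger label.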

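Modified-RS

We are now ready to define Modified-RS (m-RS), where a processor with knowledge that tasks in a set
$K \subseteq V$ have been completed chooses the next task $\tau$ to be completed at random from $V \setminus K$ if and only if
$\forall t \in V \setminus K$, $l(\tau) \le l(t)$; that is, $\tau$ is drawn from the incomplete tasks of minimum label.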
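
The selection rule can be sketched in a few lines. The code below is an illustration under stated assumptions: `labels` is a dictionary produced by a labeling step such as the one described in this section, `completed` is the set K of tasks the group knows to be done, and the choice among minimum-label incomplete tasks is taken to be uniform. The function name is hypothetical.

```python
import random


def modified_rs_next(labels, completed, rng=random):
    """Pick the next task: uniform among incomplete tasks of minimum label.

    Returns None when every task is known to be complete.
    """
    remaining = [t for t in labels if t not in completed]
    if not remaining:
        return None
    lowest = min(labels[t] for t in remaining)
    candidates = [t for t in remaining if labels[t] == lowest]
    return rng.choice(candidates)


if __name__ == "__main__":
    labels = {1: 0, 2: 0, 3: 1}
    print(modified_rs_next(labels, {1}))     # only task 2 is eligible
    print(modified_rs_next(labels, {1, 2}))  # level 1 is done, so task 3
```

Restricting the draw to the lowest incomplete label is what guarantees that no task is scheduled before the tasks it depends on.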



5.3 Analysis of Modified-i?S 

In this section, we analyze the competitive ratio of Modified-RS and show that it is tight by obtaining an
upper bound on the work performed by our algorithm on any computational (p, t)-DAG and a 2-level
task t-DAG that matches the lower bound of the previous section.



5.3.1 Upper Bound for m-RS on a 2-level task DAG 

We start by defining saturated and unsaturated vertices. 

Definition 6. Let C be a computational (p, t)-DAG. Associated with C are the two functions $h : V \to [t] \cup \{0\}$
and $g : V \to 2^{[p]} \setminus \{\emptyset\}$. For a subgraph $C' = (V', E')$ of C, let $H(C') = \sum_{v \in V'} h(v)$. Then, we say that a
vertex $v \in V$ is saturated if $H(P_C(v)) \le t$; otherwise, v is unsaturated. Let S denote the set of saturated
vertices and U the set of unsaturated vertices. Note that if v is saturated, then the group g(v) must complete
h(v) tasks regardless of the scheduling algorithm used. Along these same lines, if v is an unsaturated vertex
for which $t > \sum_{u < v} h(u)$, the group g(v) must complete at least $\max(h(v), t - \sum_{u < v} h(u))$ tasks under any
scheduling algorithm. A saturated vertex s is $l_1$-unsaturated if $H(P_C(s)) > \alpha t$. If v is an unsaturated vertex
for which $\sum_{u < v} h(u) < t$ we replace v by a pair of vertices $v_s$ and $v_u$ and an edge $(v_s, v_u)$ such that all edges
directed into v get directed into $v_s$ and all edges directed out of v get directed out of $v_u$, and h is redefined so
that $h(v_s) = t - \sum_{u < v} h(u)$ and $h(v_u) = h(v) - h(v_s)$. Doing this will allow us to have

$$v \text{ unsaturated} \implies \sum_{u < v} h(u) \ge t.$$

In the same way we can have

$$v \; l_1\text{-unsaturated} \implies \sum_{u < v} h(u) \ge \alpha t.$$
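
The saturated/unsaturated classification is easy to compute directly from the definition. The following sketch is illustrative only: it assumes the computational DAG is given as a hypothetical adjacency dictionary `adj` with a weight dictionary `h`, sums h over the predecessor graph of each vertex (the vertex and all of its ancestors), and reports the vertices whose total does not exceed t.

```python
def saturated_vertices(adj, h, t):
    """Return the set of vertices v with H(P_C(v)) <= t."""
    preds = {}
    for u, vs in adj.items():
        for v in vs:
            preds.setdefault(v, set()).add(u)

    def ancestors_with_self(v):
        # Vertex set of the predecessor graph at v.
        seen, stack = {v}, [v]
        while stack:
            x = stack.pop()
            for y in preds.get(x, ()):
                if y not in seen:
                    seen.add(y)
                    stack.append(y)
        return seen

    return {v for v in h if sum(h[u] for u in ancestors_with_self(v)) <= t}


if __name__ == "__main__":
    adj = {"g1": ["g3"], "g2": ["g3"]}   # two groups merge into g3
    h = {"g1": 6, "g2": 4, "g3": 5}      # work quotas
    print(saturated_vertices(adj, h, t=12))
```

With t = 12, the merged group is unsaturated because the quotas on the paths into it already total 15, while both initial groups are saturated.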



We will also use a generalized degree-counting argument shown in [5].

Lemma 3. Let $G = (U, V, E)$ be an undirected bipartite graph with a non-negative weight h(v) on each vertex v, and for a vertex v, let $\Gamma(v)$ be the set of vertices
adjacent to v. Suppose that for some $A > 0$ and for each $u \in U$ we have $\sum_{v \in \Gamma(u)} h(v) \le A$ and that for some
$B > 0$ and for each vertex $v \in V$ we have $\sum_{u \in \Gamma(v)} h(u) \ge B$. Then

$$\frac{\sum_{u \in U} h(u)}{\sum_{v \in V} h(v)} \ge \frac{B}{A}.$$


Theorem 4. Algorithm Modified-RS is $\left(1 + cw\left((1-\alpha) + \frac{\alpha}{e^{c+1}}\right)\right)$-competitive for any computational
(p,t)-DAG and for a 2-level task t-DAG. Here, cw stands for the computation width of the computational
(p, t)-DAG, $\alpha \in (0, 1]$ ($\alpha$ is the fraction of tasks at level $l_1$) and $c = e(1-\alpha)/\alpha$.

Proof. By the definition of an unsaturated/saturated vertex we have $W_{OPT} \ge \sum_{s \in S} h(s)$ and

E h ^ ^ f ( 2 ) 

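We define $T_v$ as the random variable denoting the number of tasks that m-RS completes at vertex v (if v is
saturated then $T_v = h(v)$).

Given the (p,t)-DAG $C = (V, E)$, construct the following bipartite graph $G = (S, U, E(G))$ s.t. $E(G) =
\{(s, u) \mid s < u\}$. Assign the weight $E[T_v]$ to vertex v. By equation (2),

$$\forall u \in U, \quad \sum_{s \in \Gamma(u)} E[T_s] = \sum_{s \in \Gamma(u)} h(s) \ge t. \qquad (3)$$

We show that

$$\forall s \in S, \quad \sum_{u \in \Gamma(s)} E[T_u] \le cw\left((1-\alpha) + \frac{\alpha}{e^{c+1}}\right) t. \qquad (4)$$

The expected work of m-RS is

$$W_{m\text{-}RS} = E\left[\sum_{s \in S} T_s\right] + E\left[\sum_{u \in U} T_u\right].$$

By linearity of expectation,

$$W_{m\text{-}RS} = \sum_{s \in S} E[T_s] + \sum_{u \in U} E[T_u].$$

Note that equations (3) and (4) together with Lemma 3 give

$$W_{m\text{-}RS} \le \left(1 + cw\left((1-\alpha) + \frac{\alpha}{e^{c+1}}\right)\right) \sum_{s \in S} E[T_s]
= \left(1 + cw\left((1-\alpha) + \frac{\alpha}{e^{c+1}}\right)\right) \sum_{s \in S} h(s)
\le \left(1 + cw\left((1-\alpha) + \frac{\alpha}{e^{c+1}}\right)\right) W_{OPT}$$

as desired.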

Now we show equation (4). Consider a saturated vertex $s \in S$ and its successor graph $S_C(s)$. $S_C(s)$ is covered
by w paths $P_i$, $i = 1, \ldots, w$, where w is at most cw. For each path $P_i$ let $u_i^0$ be the first unsaturated vertex
and $s_i^1$ be the first $l_1$-unsaturated vertex. Let $L_{s_i^1}$ be the random variable whose value is the set of tasks left
incomplete by m-RS at the formation of the group $g(s_i^1)$. For a fixed $\tau \in G_1$, conditioned upon the event
that $\tau$ is not yet complete, the probability that $\tau$ is not chosen by m-RS at a selection point in $P_C(s_i^1)$ is no
more than $\left(1 - \frac{1}{\alpha t}\right)$. As $s_i^1$ is $l_1$-unsaturated, $\sum_{u < s_i^1} h(u) \ge \alpha t$; then for each i,

$$\Pr[\tau \in L_{s_i^1}] \le \left(1 - \frac{1}{\alpha t}\right)^{\alpha t} \le \frac{1}{e}.$$

As there are $\alpha t$ tasks in $G_1$, $E[|L_{s_i^1}|] \le \frac{\alpha t}{e}$. As before, we let T be the actual number of tasks left undone at
vertex $s_i^1$. By the same reasoning as in Lemma 2 we see that,



Pr[\T - E[L s i]\ > 4:log(t)Vat] < 



0(t*) 

Then T < at (- + o(l)J = — with high probability. Now for each m° consider the subgraph Hi of the 

( w \ 

computational (p, i)-DAG defined as Hi = I (J Sc{s)) C\Pc{uf). Given a task t £ G\ conditioned upon 

V ; . . / . 

the event that r is not yet complete, the probability that t is not chosen by m-RS at a selection point in 
H^\ is no more than (1 - ^). As I]{ s i <t ,} n { t ,< tl o} h ( v ) > (1 ~ a K for cacn i 

/ c \ (l-a)t 1 

p r[T e £ „ ; ]<(i--) <-,-. 



So the E[|L u o|] < (1 — a)i H — — Let Xi be the random variable whose value is the number of tasks 

done by m-RS on the portion of Pi consisting of unsaturated vertices. Xj < |i u p| so E[Xj] < E[|L M o|]. By 
linearity of expectation 

E[J2x i ]<aw((l-a)t+^^ I ) 

i e " 

Now every unsaturated vertex appears in some Pi in Sc(s) hence, 

E l T u] < E[^ Xi] < cw(l -a+ )t 

□ 

In the next section we extend the lower bound and upper bound results to any k-level task t-DAG.



1 If 3s^s.t. ~}2 h(v) < — ), the competitive ratio is no worse than 1 + cw(l — a). 
Figure 3: Right: task t-DAG, $\alpha_{1,2,\ldots,k} \in (0, 1]$, Left: An instance of a k-level computational (p,t)-DAG



5.4 A lower bound for k-level task graphs

Theorem 5. Let A be a scheduling algorithm for k-level task graphs. Then, 

WA>(l + cw ((1 - ox) + e(1 _ o(1) ^ ea>+B> )) WW 

where a*, i = l..k — 1 is a sequence defined as follows, 
a\ = 1 

a l+ i = (l- (l))§^+a 4 

Proof. Consider the computation pattern given by the computational (p, t)-DAG and the task t-DAG in
Figure 3. Let t >> w and t mod w = 0. Initially, the computation pattern has w processors. For this
computation pattern the optimal off-line algorithm completes all the t tasks at the formation of the group
$g(S_k)$ and accrues exactly t work. Let $P_j \subset G_1$ denote the set of $\frac{\alpha_i t}{w}$ tasks for processor j at level $l_i$ of the
computation pattern (i.e. the w processors resulting from the split of group $g(S_{i-1})$). We analyze A when
the tuple $P = (P_1, \ldots, P_w)$ is selected uniformly at random among all such tuples. We will show that for
any algorithm A there is a configuration of the $P_i$ such that

W A >(l + cw ((1 - at) + "i fc — 

We first show by induction that the expected number of tasks left undone by A at the formation of group 
g(S k ) is ^ 

E[\Ls h \] > e(1 _ o(1 "k eak+ak 
when linivj^oe (l — i) 1 " = i. This will give us the bound on Wa- 






The base case is shown in theorem [TJ So we assume the result for i — 1 , namely 

ait 



E[\L Si .A}> 



e (l-o(l))^7 L e a - 1 +a-i 



Let Ti_i be the actual number of tasks left undone at the formation of g(Si-i). In lemma[6l we show that 
with a high probability Tj_i > E[\Ls i _ 1 1] (1 — o(l)). As in the proof of theorem[TJ for some otii — 1, ... i — 1 
the number of tasks left undone from Gi at g(Si-i) can exceed the ^ tasks that the w processors at level 
U can complete. 

For a specific task r in G\ not completed at g{Si) (i.e. r € ps^J), let p T = Pr[r ^ Pi] then Pr[r £ 
IJ. = p™. Let be the random variable whose value is the number of tasks of G\ left undone at the 
formation of group g(Si). 

E[|i Si |]=E[|T Si _ 1 |]-|JP i |]= Pr 

As before, the function x — > x k is convex on the interval [0,oo), so X^tPt i s minimized when the p T are 
equal. Now, 

^=E[|i\|]= Yl Pr[reP 1 }= J2 (I-*) 
So, Ere[T Si _j^ = T s,-i - = Tst.! (l - S^f) and hence, 

E[|£ Sj |] > T(l {l ~ ai)t 



As lirriw^oo (l - = ±, we have 



tf[|£*|]> 



e (l-°(l))^e» ! +a ! 



and our induction is complete. 



□ 



Lemma 6. Pr[\T - E[L g (Si)]\ > Alog(t)^/a~t\ < fori = l...k 

Proof. The proof follows exactly the proof of Lemma 2 by replacing S by $S_i$ and $\alpha$ by $\alpha_i$. □

5.5 Upper Bound for m-RS on a k-level task DAG



Theorem 7. Algorithm Modified-RS is + cw f (1 — cki) H — — - J J -competitive for any computa- 
tional (p, t)-DAG and for any k-level task t-DAG where, on € (0, 1] and c — tt^ and where a,, i = l..k is a 
sequence defined as follows, 
a± = 1 

ffli+l = fj-c Q > + a 4 

Proof. A vertex v is /^-unsaturated if J2u<v M w ) — We will proceed as in the proof of theorem IU since 
the reasoning is the same we only need to show that X) M er(s) ^Pu] — cw ( (1 — a i) — ^."1 + „ ^ Let s i 
be the first -unsaturated vertex on Path Pj, We will proceed by showing the theorem by induction on the 
tasks left undone at the formation of the group <?(s?). For j = 1 the result is show in theorem [5] Assume 
at the formation of group g(s{ 1 ),Tj-i < ^ J _ 1 we will show that at Sj our induction hypothesis

c °i 3 ~ 

holds. Now as in Section 5.2 consider for each unsaturated vertex it? the subgraph 



ff«=fys (o)nM4 +1 



Given a task r £ G\ not yet complete at the formation of the group g{s\) the probability that r is not 
chosen by Modified- RS at the formation of group g{s{) is [H As J2 s j <v<u° M w ) ^ a jK f° r eacn * 



a* \ {ai)t 1 



Pt\t € Lj] < 1 - — f- | 



e J j-i 

and the expected value of tasks left undone at s\ is less than < aj "i* and thus result follows. 

□ 



6 Conclusions 

We studied the problem of cooperatively performing a set of t tasks with dependencies in a decentralized
setting where the communication medium is subject to dynamic changes. We pursued competitive analysis
and presented a tight upper bound on the competitive ratio of our randomized algorithm Modified-RS on
k-level task t-DAGs. When the tasks are independent our results subsume the results of [5] and this bound
is tight for the case of independent tasks. We show that the performance of any scheduling algorithm for
leveled task graphs depends on the computation width, which captures the dynamics of the communication
medium, and on the nature of dependencies among the tasks. In particular we show that the performance of any
algorithm in this model can deteriorate as the size of the set of independent tasks reduces.